Hybrid weighted random forests for classifying very high-dimensional data

Authors

  • Baoxun Xu
  • Joshua Zhexue Huang
  • Graham Williams
  • Yunming Ye
Abstract

Random forests are a popular classification method based on an ensemble of a single type of decision tree built from subspaces of the data. The literature describes many different decision tree algorithms, including C4.5, CART, and CHAID, and each type may capture different information and structure. This paper proposes a hybrid weighted random forest algorithm that simultaneously uses a feature weighting method and a hybrid forest method to classify very high-dimensional data. The hybrid weighted random forest algorithm can effectively reduce subspace size and improve classification performance without increasing the error bound. We conduct a series of experiments on eight high-dimensional datasets to compare our method with traditional random forest methods and other classification methods. The results show that our method consistently outperforms these traditional methods.
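The abstract does not say how features are weighted or how tree types are mixed, so the following is only an illustrative sketch of the two ideas it names, not the authors' algorithm. It assumes information gain as the feature weight, weighted sampling without replacement for each tree's subspace, and round-robin assignment of tree types. All function names (`feature_weights`, `sample_subspace`, `plan_hybrid_forest`) are hypothetical, and the sketch only plans the forest; it does not grow the trees.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def feature_weights(rows, labels):
    """Assumed weighting: normalised information gain per feature."""
    n_features = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels)
              for j in range(n_features)]
    eps = 1e-9  # keeps every feature selectable with a small probability
    total = sum(scores) + eps * n_features
    return [(s + eps) / total for s in scores]


def sample_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favouring high-weight features."""
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(min(size, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(pick))
        remaining.pop(pick)
    return chosen


def plan_hybrid_forest(rows, labels, n_trees, subspace_size, seed=0):
    """Assign each tree a type (round-robin, an assumption) and a subspace."""
    rng = random.Random(seed)
    weights = feature_weights(rows, labels)
    tree_types = ["C4.5", "CART", "CHAID"]  # the types named in the abstract
    return [
        {"tree_type": tree_types[i % len(tree_types)],
         "features": sample_subspace(weights, subspace_size, rng)}
        for i in range(n_trees)
    ]
```

Because informative features receive larger weights, a small subspace is more likely to contain at least one useful feature, which is the intuition behind reducing subspace size on very high-dimensional data.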

Similar resources

Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies. In this case one can expect that there exist many combinations between the variables, and unfortunately the usual random forests method does not effectively exploit this situation. We here investig...

REGRESSION LEAF FOREST: A FAST AND ACCURATE LEARNING METHOD FOR LARGE & HIGH DIMENSIONAL DATA SETS by SIVANESAN GANESAN

There are a number of learning methods that provide solutions to classification and regression problems, including Linear Regression, Decision Trees, KNN, and SVMs. These methods work well in many applications, but they are challenged for real world problems that are noisy, nonlinear or high dimensional. Furthermore, missing data (e.g., missing historical features of companies in stock data), i...

Random Forests and Adaptive Nearest Neighbors

In this paper we study random forests through their connection with a new framework of adaptive nearest neighbor methods. We first introduce a concept of potential nearest neighbors (k-PNN’s) and show that random forests can be seen as adaptively weighted k-PNN methods. Various aspects of random forests are then studied from this perspective. We investigate the effect of terminal node sizes and...

Random Forests with Missing Values in the Covariates

In Random Forests [2] several trees are constructed from bootstrap or subsamples of the original data. Random Forests have become very popular, e.g., in the fields of genetics and bioinformatics, because they can deal with high-dimensional problems including complex interaction effects. Conditional Inference Forests [8] provide an implementation of Random Forests with unbiased variable selection...

Extensions to Quantile Regression Forests for Very High-Dimensional Data

This paper describes new extensions to the state-of-the-art regression random forests Quantile Regression Forests (QRF) for applications to high dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other one containing less important features. Th...

Publication date: 2012